Date : 12/02/2021
Written by Byungkyu Lee (bklee009@gmail.com)

# README File for replication of all figures and tables reported in 
Lee, Byungkyu, Kai-Cheng Yang, Patrick Kaminski, Peng Siyun, Meltem Odabas, Gupta Sumedha, Hank Green, Yong-Yoel Ahn, and Brea Perry. “Substitution of Non-Pharmacologic Therapy with Opioid Prescribing for Pain During the COVID-19 Pandemic.” JAMA Network Open. 2021; 4(12):e2138453

This file contains Stata Do files and R codes to reproduce all figures and tables appeared in the main text as well as Appendix. We are no able to share individual-level raw data according to the data use agreement. More information here: https://ssrc.indiana.edu/data/optum.html Although we are not able to share the raw data, we share our codes to produce all figures and tables from the scratch.

We use R and Stata for our main data processing and analysis. We use Snakemake to run all different codes to process multiple data files, and please visit Snakemake website (https://snakemake.readthedocs.io/en/stable/) for more information.

## Directory Structure
Once you specify `bucket` as your raw data directory in R codes and `root_dir` in Snakemake file, you will be able to reproduce all these results. 

## Data
Although we are not able to include the raw claims data from Optum due to data restrictions, we include the following data sets that can be used to replicate our paper.

1) Claims data : We processed individual-level claims quarterly data and aggregate them using `code/4_data_process/3_combine_aggregate_data.R` : `ses_division_weekly_covid_0101_0930.fst`

2) additional data
* `additional_data/drugs.tsv` : a list with NDC codes for opioids and other stimulants 
* `additional_data/Drugs/controlled_substance_NDC.csv` : a list with NDC codes for controlled substance 
* `additional_data/pain_code/pain_dx_list.csv` : all pain diagnosis (ICD9/10) from Crosswalk of ICD-9-CM to ICD-10-CM diagnostic codes for common pain conditions : https://github.com/PainResearch/PainCondition_ICD9CM_ICD10CM_Crosswalk
* `additional_data/pain_therapy/pain_therapy_CPT.csv` : a set of codes to identify non opioid pain therapies.

3) processed data
* `processed_data/ses_division_weekly_covid_0101_0930.fst` includes weekly data aggregated at census division across demographic categories.

## Code 
To replicate results, you should run all the codes in the following order. 

0. transform csv file to fst file (see also https://www.fstpackage.org/)
* run `code/0_create_fst_format/convert_v4_to_fst_SES.smk`

1. generate weekly data from quarterly data
* first generate daily data from quarterly data : run `code/1_generate_weekly_format/0_daily/1_convert_files_daily.smk`
* and then combine daily data to generate weekly data : run `code/1_generate_weekly_format/1_weekly/1_convert_files_weekly_2019_from_daily.smk` and `code/1_generate_weekly_format/1_weekly/1_convert_files_weekly_2020_from_daily.smk`

2. create weekly location file (i.e. census division) ... this step is important since patients actually may be able to move within each time window
* first split quarterly data into weekly data : run `code/2_geo_process/1_split_mbr_to_weekly.R`
* and then identify location (i.e., census division) based on the length of their stay in each location : run `code/2_geo_process/2_geo_process_weekly.smk`

3. create patient-level indicators : pain patients, opioid prescriptions, non-opioid pain therapies 
* run `code/3_patient_indicators/SES_process_patient_indicators_weekly.smk` file

4. combine processed data
* run `code/4_combine_data/2_generate_regdata_covid_opioid_weekly.smk` file to generate weekly data with regression table format that includes all patient indicators + geo location + socio-demographic variables in addition to opioid-therapy transition files.
* run `code/4_combine_data/3_combine_aggregate_data.R` to generate weekly data aggregated at census division levels which later are used to produce Figure 1 and 2.
* run `code/4_combine_data/3_combine_individual_data.R` to combine all individual-level data, which later are used to produce individual-level pain treatment transition tables.

5. analyze data and produce final figures/tables.
Once you have all these files, you can run the following code to replicate our results.

You need the raw data to produce the following tables/figures:
* table s1 : run `1_table_s1.R`
* table s2 : run `1_table_s2.R`
* table s4 : run `1_table_s4.R`
* table s5 : run `1_table_s5.R`
* table s6 and s7 : first run `2_export_data_to_stata.R`, and run `3_calculate_transition.do`  and then run `5_table_s6_s7.R`
* figure 3 and table s8 [but please see `diff_transition_period.txt` file under `processed_data` folder], first run `2_export_data_to_stata.R`, and run `3_calculate_transition.do`  and then run `5_figure3_table_s8.R`

You can produce the following figures using `ses_division_weekly_covid_0101_0930.fst` files:
* figure 1 : run `4_figure_1.R`
* figure 2 : run `4_figure_2.R`
* figure s1 : run `4_figure_s1.R`

## Session Info

We use Stata MP17 and R 4.0.4 to run the analysis.

```
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.4 (Ootpa)

Matrix products: default
BLAS:   /geode2/soft/hps/rhel8/r/4.0.4/lib64/R/lib/libRblas.so
LAPACK: /geode2/soft/hps/rhel8/r/4.0.4/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggalluvial_0.12.3  ggsci_2.9          forcats_0.5.1      dplyr_1.0.6       
 [5] purrr_0.3.4        readr_1.4.0        tidyr_1.1.3        tibble_3.1.2      
 [9] tidyverse_1.3.1    zoo_1.8-9          ggpubr_0.4.0       ggplot2_3.3.3     
[13] tsibble_1.0.1      vroom_1.5.3        logger_0.2.1       stringr_1.4.0     
[17] future.apply_1.7.0 future_1.21.0      fst_0.9.4          data.table_1.14.0 
[21] bit64_4.0.5        bit_4.0.4          rio_0.5.26        

loaded via a namespace (and not attached):
 [1] httr_1.4.2        jsonlite_1.7.2    carData_3.0-4     modelr_0.1.8     
 [5] assertthat_0.2.1  cellranger_1.1.0  globals_0.14.0    pillar_1.6.1     
 [9] backports_1.2.1   lattice_0.20-41   glue_1.4.2        digest_0.6.27    
[13] ggsignif_0.6.1    rvest_1.0.0       colorspace_2.0-1  pkgconfig_2.0.3  
[17] broom_0.7.6       listenv_0.8.0     haven_2.4.1       scales_1.1.1     
[21] openxlsx_4.2.3    tzdb_0.1.2        generics_0.1.0    car_3.0-10       
[25] ellipsis_0.3.2    withr_2.4.2       cli_2.5.0         magrittr_2.0.1   
[29] crayon_1.4.1      readxl_1.3.1      ps_1.6.0          fs_1.5.0         
[33] fansi_0.4.2       parallelly_1.25.0 anytime_0.3.9     rstatix_0.7.0    
[37] xml2_1.3.2        foreign_0.8-81    tools_4.0.4       hms_1.1.0        
[41] lifecycle_1.0.0   munsell_0.5.0     reprex_2.0.0      zip_2.1.1        
[45] compiler_4.0.4    rlang_0.4.11      grid_4.0.4        rstudioapi_0.13  
[49] gtable_0.3.0      codetools_0.2-18  abind_1.4-5       DBI_1.1.1        
[53] curl_4.3.1        R6_2.5.0          lubridate_1.7.10  utf8_1.2.1       
[57] stringi_1.6.2     Rcpp_1.0.6        vctrs_0.3.8       dbplyr_2.1.1     
[61] tidyselect_1.1.1 

```

If you have any question on these files or fail to reproduce any of figures or tables, then please direct your correspondence to BK Lee (bklee009@gmail.com). 

